Samples: 1000 (total cohort size)
Features: 21 (predictor variables)
Imbalance ratio: 1.000 (minority / majority)
Target entropy: 1.000 bits (Shannon)
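
The imbalance ratio and target entropy reported above can be reproduced from the class labels alone. The following is a minimal sketch, not the report's own code: the function names are hypothetical, and the 500/500 label split is an assumption inferred from 1000 samples with an imbalance ratio of 1.000.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Minority class count divided by majority class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def target_entropy_bits(labels):
    """Shannon entropy of the label distribution, in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Assumed labels: 500 per class (not taken from the report's data files).
labels = [0] * 500 + [1] * 500
print(imbalance_ratio(labels), target_entropy_bits(labels))
```

For a binary target, an entropy of 1 bit is the maximum and corresponds to perfectly balanced classes, which agrees with a ratio of 1.000.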

Class Balance & Missingness

[Figure: Class Distribution]
[Figure: Missingness per Column]
[Figure: Feature Correlations (correlation heatmap)]
[Figure: Numeric Distributions (histograms)]
[Figure: Distributions by Class (by-class histograms)]

Table 1 — Cohort Characteristics

Feature             Type     Missing_%  Overall_mean  Overall_sd  Class0_mean  Class0_sd  Class1_mean  Class1_sd
Haemoglobin         numeric  0.0        12.241        2.181       13.983       1.228      10.499       1.392
Haematocrit         numeric  0.0        37.12         6.029       42.015       2.941      32.225       4.014
MCV                 numeric  0.0        96.982        9.078       89.705       4.758      104.26       6.018
MCH                 numeric  0.0        31.992        2.839       29.973       2.108      34.01        1.878
MCHC                numeric  0.0        32.997        1.364       33.951       1.01       32.044       0.94
RDW                 numeric  0.0        15.038        2.55        13.039       1.036      17.036       1.986
WBC                 numeric  0.0        5.269         1.862       6.575        1.419      3.963        1.229
Platelets           numeric  0.0        212.64        66.215      247.397      48.392     177.884      63.364
Serum_B12           numeric  0.0        281.397       149.936     399.019      120.709    163.774      52.063
Folate              numeric  0.0        8.571         2.84        8.985        3.034      8.157        2.567
Methylmalonic_Acid  numeric  0.0        474.957       249.058     251.13       60.481     698.784      141.873
Homocysteine        numeric  0.0        20.039        10.361      12.055       4.29       28.023       8.293

Showing first 12 rows. Full CSV: eda_table1.csv

Group Comparisons vs. Target

Feature             Type     Test     Statistic            PValue
Methylmalonic_Acid  numeric  Welch t  -64.90347220212456   2.800092021824305e-292
Haematocrit         numeric  Welch t  43.993063238484176   5.569903702600216e-228
Haemoglobin         numeric  Welch t  41.96262858268173    2.5578108938549414e-221
MCV                 numeric  Welch t  -42.42145333046377   3.0982816336581425e-221
LDH                 numeric  Welch t  -42.48169063174602   3.6385036972064432e-199
RDW                 numeric  Welch t  -39.90085154659209   8.047338959245091e-188
Serum_B12           numeric  Welch t  40.01468054300914    1.0307855038226757e-180
Homocysteine        numeric  Welch t  -38.24118589203447   3.499258365804435e-178
MCH                 numeric  Welch t  -31.98128704487607   1.6926867353276502e-154
WBC                 numeric  Welch t  31.11813173711018    2.5155083314066547e-148
MCHC                numeric  Welch t  30.904618123761804   1.7963876643700358e-147
Bilirubin           numeric  Welch t  -23.682244363192503  7.412536433946976e-93

Showing first 12 rows. Full CSV: eda_group_tests.csv
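
As a rough cross-check, a Welch t statistic can be recomputed from the per-class summary statistics in Table 1. The sketch below uses the Haemoglobin row and rests on two assumptions that the report does not state: 500 samples per class (inferred from 1000 samples at an imbalance ratio of 1.000) and a sign convention of Class0 minus Class1. The helper names are hypothetical. No p-value is computed, since that needs a t-distribution CDF outside the standard library.

```python
import math


def welch_t(mean0, sd0, n0, mean1, sd1, n1):
    """Welch t statistic (unequal variances) from summary statistics."""
    se = math.sqrt(sd0 ** 2 / n0 + sd1 ** 2 / n1)
    return (mean0 - mean1) / se


def welch_df(sd0, n0, sd1, n1):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    v0, v1 = sd0 ** 2 / n0, sd1 ** 2 / n1
    return (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))


# Haemoglobin row of Table 1; n = 500 per class is an assumption.
t = welch_t(13.983, 1.228, 500, 10.499, 1.392, 500)
df = welch_df(1.228, 500, 1.392, 500)
print(round(t, 2), round(df, 1))
```

Under these assumptions the result lands close to the reported Haemoglobin statistic of about 41.96. The small gap comes from the means and standard deviations in Table 1 being rounded to three decimals.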

Best Model: LOGISTIC (top-ranked by LogLoss, lower is better)
Best LogLoss: 0.0019 (primary selection metric)
Best AUC: 1.000 (area under ROC)
Best Precision / Recall: 1.000 / 1.000 (positive class quality)
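
For reference, binary log loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below is a standard-library illustration only, not the pipeline's implementation: the function name, the clipping constant `eps`, and the example probabilities are all assumptions.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical predictions: very confident and correct gives a small loss.
print(binary_log_loss([1, 0, 1, 0], [0.998, 0.002, 0.999, 0.001]))
```

Unlike accuracy or AUC, log loss keeps decreasing as correct predictions become more confident. This is why models that tie at 1.000 on the other metrics can still be ranked by it.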

Grid Search Diagnostics

Grid Search Overview
Overview of per-model grid-search panels (log loss; lower is better). Shaded bands indicate 95% CI across settings of other hyperparameters.
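
One plausible way to obtain such bands is to group the cross-validation rows by the value of one hyperparameter and summarise the log loss over all settings of the others. The sketch below assumes a normal-approximation interval (mean ± 1.96 standard errors). The report does not say how its 95% CI is computed, and the row layout, the `log_loss` key, and the numbers are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def marginal_ci(rows, param, score_key="log_loss", z=1.96):
    """Mean and approximate 95% CI of the score for each value of `param`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[param]].append(row[score_key])
    bands = {}
    for value, scores in groups.items():
        m = mean(scores)
        # A single score has no spread, so the band collapses to the mean.
        half = z * stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0
        bands[value] = (m, m - half, m + half)
    return bands


# Hypothetical grid-search rows, not values from the report's CSV files.
rows = [
    {"max_depth": 3, "n_estimators": 100, "log_loss": 0.010},
    {"max_depth": 3, "n_estimators": 400, "log_loss": 0.006},
    {"max_depth": 10, "n_estimators": 100, "log_loss": 0.020},
    {"max_depth": 10, "n_estimators": 400, "log_loss": 0.016},
]
bands = marginal_ci(rows, "max_depth")
print(bands)
```

Each entry maps a hyperparameter value to (mean, lower, upper), which is the information a line plot with a shaded band needs.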
LOGISTIC — Grid Search (log loss)
[Figure: LOGISTIC grid-search panel]
Raw CV results: gridsearch_results_logistic.csv
RF — Grid Search (log loss)
[Figure: RF grid-search panel]
Raw CV results: gridsearch_results_rf.csv
XGB — Grid Search (log loss)
[Figure: XGB grid-search panel]
Raw CV results: gridsearch_results_xgb.csv
SVM — Grid Search (log loss)
[Figure: SVM grid-search panel]
Raw CV results: gridsearch_results_svm.csv
Model     LogLoss (↓)  AUC    Accuracy  Precision  Recall  Best CV LogLoss
LOGISTIC  0.0019       1.000  1.000     1.000      1.000   0.0009
XGB       0.0042       1.000  1.000     1.000      1.000   0.0046
SVM       0.0074       1.000  1.000     1.000      1.000   0.0071
RF        0.0155       1.000  1.000     1.000      1.000   0.0182

Model Visualisations

LOGISTIC

LogLoss: 0.0019 AUC: 1.000 Precision: 1.000 Recall: 1.000
Best params: {"C": 0.1}
Confusion Matrix: confusion_matrix_LOGISTIC
Feature Importance: feature_importance_LOGISTIC
ROC / PR Curve: roc_pr_curve_LOGISTIC
SHAP Summary: shap_summary_LOGISTIC

XGB

LogLoss: 0.0042 AUC: 1.000 Precision: 1.000 Recall: 1.000
Best params: {"learning_rate": 0.3, "max_depth": 3, "n_estimators": 400}
Confusion Matrix: confusion_matrix_XGB
Feature Importance: feature_importance_XGB
ROC / PR Curve: roc_pr_curve_XGB
SHAP Summary: shap_summary_XGB

SVM

LogLoss: 0.0074 AUC: 1.000 Precision: 1.000 Recall: 1.000
Best params: {"C": 100, "gamma": "scale", "kernel": "rbf"}
Confusion Matrix: confusion_matrix_SVM
Feature Importance: feature_importance_SVM
ROC / PR Curve: roc_pr_curve_SVM
SHAP Summary: shap_summary_SVM

RF

LogLoss: 0.0155 AUC: 1.000 Precision: 1.000 Recall: 1.000
Best params: {"max_depth": 10, "n_estimators": 400}
Confusion Matrix: confusion_matrix_RF
Feature Importance: feature_importance_RF
ROC / PR Curve: roc_pr_curve_RF
SHAP Summary: shap_summary_RF